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Abstract 



This note merges three discussions written by all or some of the above authors about the 
Read Paper "Particle Markov chain Monte Carlo" by C. Andrieu, A. Doucet, and R. Hollenstein, 
presented on October 16 at the Royal Statistical Society and to be published in the Journal of 
U: the Royal Statistical Society Series B. 



1 Introduction 

We congratulate the three authors for opening such a new vista for running MCMC algorithms 
. in state-space models. Being able to devise a correct Markovian scheme based on a particle ap- 

proximation of the target distribution is a genuine "tour de force" that deserves an enthusiastic 

ON ■ recognition! This is all the more impressive when considering that the ratio 

O ; 

Pe( x i:T\yi:T)/pe{xi.. T (i - l)|yi;r) (11) 

is not unbiased a nd th us invalidate the usual importance sampling solutions, as demonstrated by 

is 



Beaumont et al. I tod ). Thus, the resolution of simulating by conditioning on the lineage truly 



an awesome resolution of the problem! 
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2 Implementation and code 

We implemented the PHM algorithm for the (notoriously challenging) stochastic volatility model 

y t \x t ~ A/"(0, e xt ), x t = n + p{x t -\ - n) + oe t , 

based on several hundred simulated observations. With parameter moves 

//- A%,20- 2 ), 
p*~A/(p,20- 2 ), 
log a* ~ AA(logcr,20- 2 ), 
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Figure 1: Evolution of the (left) parameter simulations for p, p and a, plotted against iteration 
indices and of the (right) estimated acceptance rate of the PMCMC algorithm, obtained N = 10 2 
particles and 10 4 Metropolis-Hastings iterations and a simulated sequence of 500 observations with 
true values po = 1, po = 0.9 and ctq = 0-5. 
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Figure 2: Autocorrelation (left) and pairwise (right) graphs for the p, p and a sequences for the 
same target as in Figure [TJ 
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(a) Parameter values. 



(b) Autocorrelograms. 



Figure 3: Evolution of the (left) parameter simulations for p, p and a, plotted against iteration 
indices, and autocorrelograms for p, p and a, for 100 observations and 100 particles. 



and state-space moves derived from the AR(1) prior, we obtained good mixing properties with no 
calibration effort, using N = 10 2 particles and 10 4 Metropolis-Hastings iterations, as demonstrated 
by Figures [Q and [5J 

The acceptance rate for this configuration, when using a variance of 0.05 2 for each parameter 
move, and 100 particles for 500 observations, was around 25%. With 100 observations and 100 
particles, the results of the PHM algorithm showed a bimodality in the Markov chain as presented 
in Figure El 

The sequence of the p's on the lhs of Figure [3] exhibits concentrations around both 0.9 and —0.9. 
This bimodality of the posterior on p disappears when the number of observations grows, as shown 
by Figures [2] and HI obtained with 1000 simulated observations and 1000 particles. 

Bimodality of the posterior on p could possibly occur for a larger number of observations. It may 
also be an artifact of the simulation method, which would require a higher number of particles or 
of iterations to assess. Unfortunately, this is computationally very demanding: using our Python 
programme on 10, 000 iterations, 1000 observations and 1000 particles requires five hours on a 
mainframe computer. 

Figures [3] and HJ obtained with the same proposal variance, also illustrate the severe decrease 
in the acceptance rate when the number of observations grows: the proposed parameter values get 
rejected for more than 100 iterations in a row on Figur^U 

Our computer program is written in the Python language. It is available for download at 
http://code.google.eom/p/py-pmmh/ and may be adapted to any state-space model by simply 
rewriting two lines of the code, namely those that (a) compute p(yt\xt) and (b) simulate xt+i\xt- 
Contemplating a different model does not even require the calculation of full conditionals, in con- 
trast to Gibbs sampling. Another advantage of the py-pmmh algorithm is that it is trivial to 
parallelise. (Adding a comment before the loop over the particle index is enough, using the OpenMP 
technology.) 
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(a) Parameter values. 



(b) Autocorrelograms. 



Figure 4: Evolution of the (left) parameter simulations for p,, p and a, and autocorrelograms for p, 
p and cr, plotted against iteration indices, for 1000 observations and 1000 particles. 



3 Recycling 

Although this is already addressed in the last section of the paper, we mention here possible options 
for a better recycling of the numerous simulations produced by the algorithm. This dimension of 
the algorithm deserves deeper study, maybe to the extent of allowing fo r a finite time hor izon 
overcoming the MCMC nature of the algorithm, as in the PMC solution of Cappe et al. ( 20081 ). 

A more straightforward remark is that, due to the additional noise brought by the resampling 
mechanism, more stable recycling would be produced both in the individual weight s ui n {Xi- n ) 
by Rao-Blackwellisation of the denominator in eqn. (7) as in llacobucci et al.l (120091 ) and over 



past iterations by a complete reweig hting scheme like AMIS (|Cornuet et all 120091 )." Another ob- 



vious question is whether or not the exploitation of the wealth of i nformation provided by the 



population simulations is ma nageable via adaptive MCMC methods (jAndrieu and Robert) . 12001 
Roberts and Rosenthal . 20091 ) 



4 Model choice 

Since 

T 

Pe{yi:T) = pe{y\) Y[pe(y n \yi: 



n-lj 



is an unbiased estimator of po(yv.T), there must be direct implications of the method towards 
deriving better model ch o ice st rategies in state-space models, as exemplified in Population Monte 
Carlo by iKilbinger et al.1 (|2009h in a cosmology setting. 



In fact, the paper does not cover the corresponding calculation of the marginal likelihood p(y), 
the central quantity in model choice. However, the PMCMC approach seems to lend itself naturally 
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to the use of Chib's ( 19951 ) estimate, 



i.e. 



p(y) 



p(o*)p(v\ 



p{0*\y) 

for any 9*. Provided the p(9\x, y) density allows for a closed-form expression, the denominator may 
be estimated by 



p(6\y) 



1 



M 



p(9\x, y)p(x\y) dx ~jj ^2p(8\x 

i=i 



Xi,y) , 



where the Xj's, i = 1, ...,M, are provided by the MCMC output. 

The novelty in Andrieu et al. (2009) is that p{y\9) in the numerator needs to be evaluated as 
well. Fortunately, as pointed out above, each iteration i of the PMCMC sampler provides a Monte 
Carlo estimate of p(y\9 = 9i), where 9{ is the current parameter value. Some care may be required 
when choosing 9*; e.g. selecting the 9* with largest (evaluated) likelihood may lead to a biased 
estimator. 



We did some experiment s in orde r to co mpare th e approach described abov e with INLA (jRue et al 



2009h and nested sampling (jSkillintd . 120071 ; see also IChopin and Robert! . 120071 ) , using the stochastic 



volatility example of Rue et al. (2009). Unfortunately, our PMCMC program requires more than 
one day to complete (for a number N of particles and a number M of iterations that are sufficient 
for reasonable performance), so we were unable to include the results in this discussion. A likely 
explanation is that the cost of PMCMC is at least 0(T 2 ), where T is the sample size (T = 945 in 
this example), since, according to the authors, good performance requires than N = 0(T), but our 
implementation may be sub-optimal as well. 

Interestingly, nested sampling perfor ms reasonab l y wel l on this example (reasonable error ob- 



tained in one hour), and, as reported by Rue et al. ( 20091 ). the INLA approximation is fast (one 



second) and very accurate, but more work is required for a more meaningful comparison. 



5 Connections with some of the earlier literature 



Two interesting metrics for the impact of a Read Paper are: (a) the number of previous papers 
it impacts in some way; and (b) the number of interesting theoretical questions it opens. In both 
respects, this paper fares very well. 

Regarding (a), in many complicated models the only tractable opera tions are state filter i ng an d 
likelihood evaluation, as shown, e.g., in the continuous-time model of Chopin and Varini ( 20071 ). 
In such settin gs, the PHM a lgorithm offers Bayesian estimates " for free" , which is very nice. 

Similarly, Chopin ( 20071 ) . see also Fearnhead and Liu ( 20071 ) . formulates change-point models 
as state-space models, where the state xt = (9t,dt), comprises the current parameter 9t and the 
time since the last change dt- In this setting, one may use SMC to recover the trajectory Xi-t, i.e., 
all the change dates and parameter values. It works well when xt forgets about its past quickly 
enough, but this forbids hierarchical priors for the durations and the parameters. PHM removes 
this limitation: Chopin's (2007) SMC algorithm may indeed be embedded into a PHM algorithm, 
where each iteration corresponds to a different value of the hyper-parameters. This comes as a cost 
however, as each MCMC iteration need run a complete SMC algorithm. 

Regarding point (b), several questions, which have already been answered in the standard SMC 
case, may again be asked about PMCMC: Does residual resampling outperform multinomial resam- 
pling? Is the algorithm with N + 1 particles strictly better than the one with N particles? What 
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about Rao-Blackwellisation, or the choice of the proposal distribution? One technical difficulty 
is that marginalising out components always reduces the variance in SMC, but not in MCMC. 
Another difficulty is that PMCMC retains only one particle trajectory xi : t, hence the impact in 
reducing variability between particles is less immediate. 

Similarly, obtaining a single trajectory x\-t from a forward filter is certainly much easier than 
obtaining many of them, but this may still be quite demanding in some scenarios, i.e., there may 
be so much degeneracy in x\ that not a single particle does contain a x\ that is in the support of 
p{xi\y 1:T )- 
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